Target recognition method and apparatus, storage medium, and electronic device

ABSTRACT

A method for identifying a target, a non-transitory computer-readable storage medium, and an electronic device include: acquiring a first image and a second image, the first image and the second image each including a target to be determined; generating a prediction path based on the first image and the second image, both ends of the prediction path respectively corresponding to the first image and the second image; and performing validity determination on the prediction path and determining, according to a determination result, whether the targets to be determined in the first image and the second image are the same target to be determined.

CROSS-REFERENCE TO RELATED APPLICATIONS

This is a continuation of U.S. patent application Ser. No. 16/565,069 filed on Sep. 9, 2019, which is a continuation of International Patent Application No. PCT/CN2018/097374 filed on Jul. 27, 2018, which claims priority to Chinese Patent Application No. 201710633604.3 filed on Jul. 28, 2017. The disclosures of the above-referenced applications are incorporated herein by reference in their entirety.

BACKGROUND

Vehicle re-identification, such as car re-identification, is an important topic in the fields of computer vision and public safety, and has great potential for applications such as vehicle detection and tracking, travel route estimation, and abnormal behavior detection.

SUMMARY

Embodiments of the present disclosure relate to the technical field of artificial intelligence, and in particular to a method and an apparatus for identifying a target, a non-transitory storage medium, and an electronic device, and provide technical solutions for target identification.

According to a first aspect of the embodiments of the present disclosure, a method for identifying a target is provided. The method includes: acquiring a first image and a second image, the first image and the second image each including a target to be determined; generating a prediction path based on the first image and the second image, both ends of the prediction path respectively corresponding to the first image and the second image; and performing validity determination on the prediction path and determining, according to a determination result, whether the targets to be determined in the first image and the second image are the same target to be determined.

According to a second aspect of the embodiments of the present disclosure, an apparatus for identifying a target is provided. The apparatus includes: an acquisition module configured to acquire a first image and a second image, the first image and the second image each including a target to be determined; a generation module configured to generate a prediction path based on the first image and the second image, both ends of the prediction path respectively corresponding to the first image and the second image; and a first determination module configured to perform validity determination on the prediction path and determine, according to a determination result, whether the targets to be determined in the first image and the second image are the same target to be determined.

According to a third aspect of the embodiments of the present disclosure, a non-transitory computer-readable storage medium is provided, and has computer program instructions stored thereon, where when the program instructions are executed by a processor, steps of the method for identifying a target according to the first aspect of the embodiments of the present disclosure are implemented.

According to a fourth aspect of the embodiments of the present disclosure, an electronic device is provided, and includes: a processor, a memory, a communication element, and a communication bus, where the processor, the memory, and the communication element communicate with one another by means of the communication bus; and the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to execute steps of the method for identifying a target according to the first aspect of the embodiments of the present disclosure.

According to the technical solutions provided by the embodiments of the present disclosure, a prediction path through which the targets to be determined may pass is generated based on information contained in the first image and the second image; and whether the targets to be determined in the first image and the second image are the same is determined by performing validity determination on the prediction path. The validity determination determines how likely the current prediction path is to be the travel route of one and the same target to be determined. The higher this possibility is, the more likely it is that the targets to be determined in the first image and the second image are the same target to be determined. Thus, whether targets to be determined in different images are the same target to be determined can be detected and identified more accurately.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a schematic flowchart of a method for identifying a target according to Embodiment I of the present disclosure;

FIG. 2 is a schematic flowchart of a method for identifying a target according to Embodiment II of the present disclosure;

FIG. 3 is a schematic flowchart of a method for identifying a target according to Embodiment III of the present disclosure;

FIG. 4 is a structural block diagram of an apparatus for identifying a target according to Embodiment IV of the present disclosure;

FIG. 5 is a structural block diagram of an apparatus for identifying a target according to Embodiment V of the present disclosure;

FIG. 6 is a structural block diagram of an apparatus for identifying a target according to Embodiment VI of the present disclosure; and

FIG. 7 is a schematic structural diagram of an electronic device according to Embodiment VII of the present disclosure.

DETAILED DESCRIPTION

The specific implementations of the embodiments of the present disclosure are further described in detail below with reference to the accompanying drawings (the same reference numerals in several accompanying drawings represent the same elements) and the embodiments. The following embodiments are intended to illustrate the present disclosure, but are not intended to limit the scope of the present disclosure.

Persons skilled in the art can understand that the terms “first”, “second” and the like in the embodiments of the present disclosure are only used to distinguish different steps, devices or modules, etc., and do not represent any specific technical meaning or necessary logical sequence therebetween.

Most vehicle re-identification technologies are based on appearance information of vehicles. Unlike pedestrian re-identification, the difficulty in performing vehicle re-identification by simply using the appearance information of vehicles is that many vehicles have similar appearances (such as color, model, and shape). The differences are even smaller between different vehicles of the same brand and the same style. For detection and identification depending on identification information of vehicles, such as license plate information of cars, decorations in vehicles, and other unique details, the robustness of detection and identification may become weaker due to poor viewing angles of surveillance cameras, poor lighting conditions, blurred lenses, and other factors, resulting in inaccurate detection and identification results.

Embodiment I

FIG. 1 is a schematic flowchart of a method for identifying a target according to Embodiment I of the present disclosure. As shown in FIG. 1, the method for identifying a target in this embodiment includes the following steps:

In step S102, a first image and a second image are acquired.

In a specific implementation, in terms of content, the first image and the second image each include a target to be determined. In terms of image type, the first image and the second image may both be captured static images, or video images in a video frame sequence, and the like. Specifically, the targets to be determined include a pedestrian, an unmanned aerial vehicle, a vehicle, and the like. It should be understood that this embodiment is not limited thereto, and any movable object falls within the range of the targets to be determined.

In step S104, a prediction path is generated based on the first image and the second image.

Both ends of the prediction path respectively correspond to the first image and the second image. In the embodiments of the present disclosure, travel routes of the targets to be determined may be predicted based on feature information of the targets to be determined contained in the first image and the second image and spatiotemporal information contained in the first image and the second image, and the reliability of identification of the targets to be determined is enhanced by means of the route prediction results. Specifically, based on the information contained in the first image and the second image, possible travel routes of the targets to be determined in the images need to be found, where images of the targets to be determined captured on the travel routes should be spatiotemporally related to the first image and the second image.

In step S106, validity determination is performed on the prediction path, and whether the targets to be determined in the first image and the second image are the same target to be determined is determined according to a determination result.

The validity determination determines how likely a prediction path is to be the travel route of one and the same target to be determined. The higher this possibility is, the more likely it is that the target to be determined in the first image is the same as the target to be determined in the second image. In a specific implementation, the result of the validity determination may be a validity probability, or may directly be “valid or not.”

According to the method for identifying a target provided by this embodiment, a prediction path through which the targets to be determined may pass is generated based on information contained in the first image and the second image; and whether the targets to be determined in the first image and the second image are the same is determined by performing validity determination on the prediction path. The validity determination determines how likely the current prediction path is to be the travel route of one and the same target to be determined. The higher this possibility is, the more likely it is that the targets to be determined in the first image and the second image are the same target to be determined. Thus, whether targets to be determined in different images are the same target to be determined can be detected and identified more accurately.

The method for identifying a target in this embodiment is performed by any appropriate device having image or data processing capabilities, including but not limited to: a camera, a terminal, a mobile terminal, a Personal Computer (PC), a server, an in-vehicle device, an entertainment device, an advertising device, a Personal Digital Assistant (PDA), a tablet computer, a laptop computer, a handheld game console, smart glasses, a smart watch, a wearable device, a virtual display device, or a display enhancement device (such as Google Glass, Oculus Rift, Hololens, Gear VR), and the like.

Embodiment II

Referring to FIG. 2, a schematic flowchart of a method for identifying a target according to Embodiment II of the present disclosure is shown.

In this embodiment, the method for identifying a target in the embodiments of the present disclosure is described by taking a vehicle as an example of the target to be determined. However, persons skilled in the art should understand that in practical application, corresponding target identification operations can be implemented for other targets to be determined with reference to this embodiment.

The method for identifying a target in this embodiment includes the following steps:

In step S202, a first image and a second image are acquired.

In a specific implementation, the first image and the second image each include a target to be determined, and the target to be determined is a vehicle.

In step S204, the prediction path of the targets to be determined is generated by means of a probability model according to the feature information of the first image, the temporal information of the first image, the spatial information of the first image, the feature information of the second image, the temporal information of the second image, and the spatial information of the second image.

Compared with a pedestrian's travel route, the travel routes of vehicles are more stable and more regular, and the accuracy of determination and identification is higher. Therefore, the travel routes of the vehicles may be predicted by using the feature information of the vehicles (which can characterize the appearances of the vehicles) together with the spatiotemporal information in the images, and the reliability of vehicle identification can be enhanced by means of the route prediction results.

The temporal information of the image indicates the time at which the image is captured, and said time may be regarded as the time at which the target to be determined (such as a vehicle) passes the photographing device. The spatial information of the image indicates the position where the image is captured, and said position may be regarded as the position where the photographing device is located, or as the position where the target to be determined, such as the vehicle, is located when being photographed. The feature information of the image indicates features of the target to be determined in the image, such as features of the vehicle; according to the features, the appearance and other information of the vehicle can be determined. It can be understood that information contained in the images involved in this embodiment includes, but is not limited to, temporal information of the images, spatial information of the images, and feature information of the images.

In a specific implementation, the probability model is a Markov Random Field (MRF).

A random field may be regarded as a set of random variables corresponding to the same sample space. In general, if there are dependencies between the random variables, the random field is considered to have practical significance. The random field includes two elements, i.e., site and phase space. When a value of the phase space is randomly assigned to each site according to a certain distribution, the whole is called a random field.

An MRF is a random field having a Markov property. The Markov property means that when a sequence of random variables is arranged in time order, the distribution characteristics at an (N+1)^(th) moment, given the value at the N^(th) moment, are independent of the values of random variables before the N^(th) moment. One MRF corresponds to one undirected graph. Each node on the undirected graph corresponds to a random variable, and an edge between nodes indicates a probability dependency between random variables corresponding to the nodes. Therefore, the structure of an MRF essentially reflects a priori knowledge, that is, which variables have dependencies therebetween that need to be considered and which can be ignored.
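As a brief illustration in standard notation (the symbols below are introduced here for illustration only and are assumptions, not part of the original description), the Markov property for a time-ordered sequence of random variables, and the corresponding factorization of a chain-structured MRF with pairwise potential functions φ and a normalization constant Z, may be written as:

$$P(x_{N+1} \mid x_1, \ldots, x_N) = P(x_{N+1} \mid x_N), \qquad P(x_1, \ldots, x_N) = \frac{1}{Z}\prod_{i=1}^{N-1}\varphi(x_i, x_{i+1})$$

Each factor couples only two adjacent nodes, which corresponds to the edges of the undirected chain graph.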

In this embodiment, at least one prediction path of the targets to be determined in the first image and the second image may be generated by means of an MRF, and then an optimal path is determined from the at least one prediction path as the prediction path of the targets to be determined. Specifically, the prediction path of the targets to be determined may be generated by means of the MRF according to the feature information of the first image, the temporal information of the first image, the spatial information of the first image, the feature information of the second image, the temporal information of the second image, and the spatial information of the second image. In one embodiment, all images including information of the targets to be determined and having a spatiotemporal sequence relationship with the first image and the second image may be determined from an acquired image set by means of a chain MRF; and the prediction path of the targets to be determined is generated according to temporal information and spatial information corresponding to all the determined images.

Spatiotemporal data refers to data that has both temporal and spatial dimensions. In geography, since continuous spatiotemporal data is extracted by means of discretization sampling and then stored, spatiotemporal data may be regarded as a temporal sequence set with spatial correlations, i.e., a spatiotemporal sequence. Data in the set may be considered as data with spatiotemporal sequence relationships. Specifically, all images having a spatiotemporal sequence relationship with the first image and the second image means that the spatiotemporal data contained in each of these images is temporally and spatially correlated with the spatiotemporal data contained in the first image and with the spatiotemporal data contained in the second image, separately.

Generally, by using the first image as a path head node image and using the second image as a path tail node image, a prediction path with the first image as a head node and the second image as a tail node may be generated according to the temporal information and the spatial information corresponding to all the images determined by means of the chain MRF, where the prediction path further corresponds to at least one intermediate node in addition to the head node and the tail node.

When determining, from an acquired image set by means of a chain MRF, all images that include information of the targets to be determined and have a spatiotemporal sequence relationship with the first image and the second image, the following operations may be performed. First, position information of all camera devices from a start position to an end position may be acquired, using the position corresponding to the spatial information of the first image as the start position and the position corresponding to the spatial information of the second image as the end position. Second, at least one device path may be generated according to the relationships between the positions indicated by the position information of all the camera devices, using the camera device corresponding to the start position as a start point and the camera device corresponding to the end position as an end point, where each device path further includes information of at least one other camera device in addition to the camera device as the start point and the camera device as the end point. Third, for each device path, using the time corresponding to the temporal information of the first image as a start time and the time corresponding to the temporal information of the second image as an end time, an image may be determined from the images captured by each of the other camera devices on the current path, where the image includes the information of the targets to be determined and has a set temporal sequence relationship with an image that includes the information of the targets to be determined and is captured by the previous camera device adjacent to the current camera device.
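The device-path generation described above can be sketched as a simple graph search. The following is a minimal illustration only, assuming that the positional relationships between camera devices are available as an adjacency mapping; the function name `enumerate_device_paths`, the parameter `max_cameras`, and the camera layout are hypothetical and chosen to mirror the route example used in this embodiment.

```python
from typing import Dict, List


def enumerate_device_paths(
    adjacency: Dict[str, List[str]], start: str, end: str, max_cameras: int = 6
) -> List[List[str]]:
    """Enumerate simple camera paths from start to end.

    Only paths with at least one intermediate camera are kept, matching the
    requirement that each device path includes at least one other camera.
    """
    paths: List[List[str]] = []

    def dfs(path: List[str]) -> None:
        node = path[-1]
        if node == end:
            if len(path) >= 3:  # start, at least one intermediate, end
                paths.append(list(path))
            return
        if len(path) == max_cameras:  # hypothetical cap on path length
            return
        for nxt in adjacency.get(node, []):
            if nxt not in path:  # never visit the same camera twice
                path.append(nxt)
                dfs(path)
                path.pop()

    dfs([start])
    return paths


# Hypothetical camera layout (an assumption for illustration only).
adjacency = {
    "A": ["B", "E", "F"],
    "B": ["C"],
    "C": ["D"],
    "E": ["D"],
    "F": ["G"],
    "G": ["H"],
    "H": ["D"],
}
device_paths = enumerate_device_paths(adjacency, "A", "D")
```

Each returned list is one candidate device path; the images captured by its intermediate cameras between the start time and the end time would then be examined as described above.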

Then, when generating a prediction path with the first image as a head node and the second image as a tail node according to the temporal information and the spatial information corresponding to all the determined images, a plurality of connected intermediate nodes having a spatiotemporal sequence relationship may be generated for each device path according to the temporal sequence relationship of the determined images; an image path having a spatiotemporal sequence relationship and corresponding to the current device path may be generated according to the head node, the tail node, and the intermediate nodes; and a maximum probability image path with the first image as the head node and the second image as the tail node may be determined from the image path corresponding to each device path as the prediction path of the targets to be determined.

When determining, from the image path corresponding to each device path, a maximum probability image path with the first image as the head node and the second image as the tail node as the prediction path of the targets to be determined, for the image path corresponding to each device path, a probability of images of every two adjacent nodes in the image path having information of the same target to be determined may be acquired; a probability of the image path being a prediction path of the target to be determined may be calculated according to the probability of the images of every two adjacent nodes in the image path having the information of the same target to be determined; and the maximum probability image path may be determined as the prediction path of the target to be determined according to the probability of each image path being a prediction path of the target to be determined.
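A minimal sketch of this selection step is given below. It assumes that the probability of an image path is computed as the product of the adjacent-pair probabilities (one plausible choice consistent with a chain model, not necessarily the only one), and all numeric values and names are hypothetical.

```python
from math import prod
from typing import Dict, List


def path_probability(pair_probs: List[float]) -> float:
    """Unnormalized path score: product of adjacent-pair probabilities (assumed rule)."""
    return prod(pair_probs)


def select_prediction_path(candidates: Dict[str, List[float]]) -> str:
    """Return the name of the image path with the highest score."""
    return max(candidates, key=lambda name: path_probability(candidates[name]))


# Hypothetical adjacent-pair probabilities for three candidate image paths.
candidates = {
    "route 1": [0.9, 0.8, 0.9],
    "route 2": [0.95, 0.9],
    "route 3": [0.9, 0.9, 0.8, 0.7],
}
best = select_prediction_path(candidates)
```

With these illustrative numbers, the two-hop path scores highest because it has fewer and stronger pairwise links.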

Taking a vehicle as the target to be determined in this embodiment as an example, it is assumed that the travel route of the vehicle in a road network is a chain MRF in which each node on the chain is a camera, and the variable space of each node is a triple composed of the images captured by the camera and the photographing times and locations of the images. Any pair of images for which it must be identified whether they involve the same vehicle is given, together with the possible surveillance cameras between them (the possible surveillance cameras are a priori information, and can be obtained in any appropriate way, such as by collecting statistics on a training data set). Each pair of images from adjacent cameras, and the spatiotemporal differences between the pair of images, are input into a Siamese-CNN to calculate the probability that the vehicles in each pair of images captured by adjacent surveillance cameras in the road network are the same vehicle. The Siamese-CNN may be regarded as a potential energy function between adjacent nodes in the MRF. The product of the potential energy functions may be maximized (optimized) by means of a Max-Sum algorithm to obtain the prediction path of the highest possibility. The prediction path includes the geographic location of each camera through which the vehicle passes, the time at which the vehicle is photographed, and related information of the captured image.

For example, by setting p to represent information of the first image (including feature information, temporal information, and spatial information) and q to represent information of the second image (including feature information, temporal information, and spatial information), the optimal path may be determined from a plurality of possible prediction paths by means of the chain MRF by maximizing the following formula (1):

$$P(X \mid x_1 = p, x_N = q) = \frac{1}{Z}\,\varphi(p, x_2)\,\varphi(x_{N-1}, q)\prod_{i=2}^{N-2}\varphi(x_i, x_{i+1}) \tag{1}$$

where P represents the probability of a prediction path (i.e., a path through which a vehicle may pass), X represents cameras, N represents the number of cameras on a prediction path, from X1 to XN, x₁ represents information of an image of the vehicle captured by X1, and so forth, x_(N) represents information of an image of the vehicle captured by XN, Z is a normalization constant, φ( ) represents the potential energy function (i.e., the output of the Siamese-CNN, which is a probability value between 0 and 1), and φ(x_(i),x_(i+1)) represents the pairwise potential energy function between x_(i) and x_(i+1), which indicates how likely x_(i) and x_(i+1) are to include information of the same vehicle. If x_(i) and x_(i+1) do include information of the same vehicle, φ(x_(i),x_(i+1)) takes a larger value; otherwise, it takes a smaller value.

When maximizing the formula (1), the time constraint described in the formula (3) may be applied so that the solution of the formula (2) satisfies the formula (3):

$$X^{*} = \underset{X}{\arg\max}\; P(X \mid x_1 = p, x_N = q) \tag{2}$$

$$t_{i,k_i^{*}} \leq t_{i+1,k_{i+1}^{*}} \quad \forall i \in \{1, \ldots, N-1\} \tag{3}$$

where t represents time, k_(i)* and k_(i+1)* respectively represent the optimal selection of information of an image corresponding to x_(i) and the optimal selection of information of an image corresponding to x_(i+1), X represents cameras, N represents the number of cameras on a prediction path, from X1 to XN, x₁ represents information of an image of the vehicle captured by X1, and so forth, x_(N) represents information of an image of the vehicle captured by XN.

In the formulae (1), (2), and (3), the information of an image includes temporal information, spatial information, and feature information of the image.

Based on the formulae (1), (2), and (3), the formula (1) may be optimized into the following formula (4) to obtain an optimal path, i.e., a maximum probability path through which the vehicle may pass.

$$\begin{aligned}\max_{X} P(X \mid x_1 = p, x_N = q) &= \frac{1}{Z}\max_{x_2}\cdots\max_{x_{N-1}}\varphi(p, x_2)\,\varphi(x_{N-1}, q)\prod_{i=2}^{N-2}\varphi(x_i, x_{i+1}) \\ &= \frac{1}{Z}\max_{x_2}\Big[\varphi(p, x_2)\,\varphi(x_2, x_3)\Big[\cdots\max_{x_{N-1}}\varphi(x_{N-1}, q)\Big]\cdots\Big]\end{aligned} \tag{4}$$

By means of the process above, a prediction path through which the vehicle is most likely to pass may be determined.
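The nested maximization in the formula (4) can be evaluated one camera at a time by dynamic programming. The sketch below is an illustration under stated assumptions, not the implementation of this embodiment: images are represented as hypothetical `(image_id, capture_time)` pairs, the potential function `phi` is a stand-in lookup table rather than a Siamese-CNN, and the constant 1/Z is omitted because it does not change which path is optimal.

```python
from typing import Callable, List, Optional, Tuple

Image = Tuple[str, float]  # (hypothetical image id, capture time)
Best = Tuple[float, Optional[List[Image]]]


def best_image_path(
    p: Image,
    q: Image,
    candidates: List[List[Image]],
    phi: Callable[[Image, Image], float],
) -> Best:
    """Max-product search over a chain: head p, tail q, one candidate list per
    intermediate camera. Returns (unnormalized score, path), or (0.0, None)
    when no time-ordered path exists."""
    layers = [[p]] + [list(c) for c in candidates] + [[q]]
    best: List[Best] = [(1.0, [p])]  # best partial path ending at each image
    for i in range(1, len(layers)):
        new_best: List[Best] = []
        for cur in layers[i]:
            top: Best = (0.0, None)
            for (score, path), prev in zip(best, layers[i - 1]):
                # Skip dead ends and pairs violating the time constraint (3).
                if path is None or prev[1] > cur[1]:
                    continue
                s = score * phi(prev, cur)
                if s > top[0]:
                    top = (s, path + [cur])
            new_best.append(top)
        best = new_best
    return best[0]


# Hypothetical pairwise probabilities standing in for the Siamese-CNN output.
table = {
    ("p", "b1"): 0.9, ("p", "b2"): 0.95,
    ("b1", "c1"): 0.8, ("b2", "c1"): 0.9,
    ("c1", "q"): 0.9, ("c2", "q"): 0.9,
}


def phi(a: Image, b: Image) -> float:
    return table.get((a[0], b[0]), 0.1)


p, q = ("p", 0.0), ("q", 30.0)
candidates = [[("b1", 10.0), ("b2", 40.0)], [("c1", 20.0), ("c2", 5.0)]]
score, path = best_image_path(p, q, candidates, phi)
```

In this toy input, the images `b2` and `c2` have attractive pairwise scores but are excluded because their capture times break the required time ordering, so the search settles on the path through `b1` and `c1`.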

For example, by using the first image as a prediction path head node A and using the second image as a prediction path tail node D, according to the positional relationships between camera devices, possible driving routes of the vehicle include: route 1: A->B->C->D; route 2: A->E->D; and route 3: A->F->G->H->D. It is determined after the calculation based on the formula (4) that the probability of route 1 is 85%, the probability of route 2 is 95%, and the probability of route 3 is 70%. Then route 2 can be determined as the prediction path of the vehicle.

It should be noted that the process above is exemplified by a chain MRF. However, in practical applications, persons skilled in the art may also use other appropriate ways to generate the prediction path of the targets to be determined. For example, background information of the first image and the second image is detected based on a deep neural network to generate the prediction path of the targets to be determined.

In step S206, validity determination is performed on the prediction path by means of a neural network, and whether the targets to be determined in the first image and the second image are the same target to be determined is determined according to a determination result.

The neural network is any appropriate neural network that can implement feature extraction or target object identification, including, but not limited to, a Convolutional Neural Network (CNN), a reinforcement learning neural network, a generative network in generative adversarial networks, and the like. The specific structure of the neural network, such as the number of convolutional layers, the size of the convolution kernel, and the number of channels, may be appropriately configured by persons skilled in the art according to actual needs, and is not limited in the embodiments of the present disclosure.

In a specific implementation, the neural network is a Long Short-Term Memory (LSTM) network. An LSTM is a time recurrent neural network which is a variant of a Recurrent Neural Network (RNN), and is good at processing sequence information. In the embodiments of the present disclosure, the prediction path of the vehicle may also be considered as sequence information, and can be processed by means of an LSTM to determine the validity of the prediction path.

The validity determination determines how likely a prediction path is to be the travel route of one and the same target to be determined. The higher this possibility is, the more likely it is that the targets to be determined in the first image and the second image are the same target to be determined.

In this embodiment, the temporal difference between adjacent images in the prediction path may be acquired according to temporal information of the adjacent images; the spatial difference between the adjacent images may be acquired according to spatial information of the adjacent images; the feature difference between the targets to be determined in the adjacent images may be acquired according to feature information of the targets to be determined in the adjacent images; the obtained temporal difference, spatial difference, and feature difference between the adjacent images in the prediction path may be input into an LSTM to obtain an identification probability of the targets to be determined in the prediction path; and whether the targets to be determined in the first image and the second image are the same target to be determined may be determined according to the identification probability of the targets to be determined in the prediction path. The specific determination criteria for whether the targets to be determined are the same target to be determined may be appropriately configured by persons skilled in the art according to actual needs, and are not limited in the embodiments of the present disclosure.

The temporal difference between the adjacent images may be obtained by performing subtraction on the temporal information of the two images, the spatial difference between the adjacent images may be obtained by calculating the distance between the locations where the two images are captured, and the feature difference between the adjacent images may be obtained by performing subtraction on feature vectors of the two images. In a feasible implementation, when obtaining the feature difference between adjacent images, a Siamese-CNN may be utilized, and feature information of the targets to be determined in the adjacent images is separately acquired by means of the Siamese-CNN; and the feature difference between the targets to be determined in the adjacent images is acquired according to the separately acquired feature information. The Siamese-CNN in this step may be the same as or different from the Siamese-CNN in step S204.
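The three differences described above can be sketched as follows. This is a minimal illustration under assumptions: each image is represented as a dictionary with hypothetical keys `time`, `pos`, and `feat`; positions are planar coordinates so that the Euclidean distance applies; the subtraction order (later image minus earlier image) is a choice made here; and the feature vectors are placeholder values rather than Siamese-CNN outputs.

```python
from math import hypot
from typing import Dict, List, Tuple


def adjacent_differences(img_a: Dict, img_b: Dict) -> Tuple[float, float, List[float]]:
    """Temporal, spatial, and feature differences between two adjacent images."""
    dt = img_b["time"] - img_a["time"]  # temporal difference
    ds = hypot(
        img_b["pos"][0] - img_a["pos"][0],
        img_b["pos"][1] - img_a["pos"][1],
    )  # spatial difference (planar distance, an assumption)
    df = [b - a for a, b in zip(img_a["feat"], img_b["feat"])]  # feature difference
    return dt, ds, df


def path_difference_sequence(path: List[Dict]) -> List[List[float]]:
    """One flattened [dt, ds, *df] vector per adjacent pair, in path order.

    Such a sequence is the kind of input that could be fed to a sequence
    model for validity determination.
    """
    sequence = []
    for img_a, img_b in zip(path, path[1:]):
        dt, ds, df = adjacent_differences(img_a, img_b)
        sequence.append([dt, ds, *df])
    return sequence


# Hypothetical two-image path with placeholder feature vectors.
path = [
    {"time": 100.0, "pos": (0.0, 0.0), "feat": [1.0, 2.0]},
    {"time": 160.0, "pos": (3.0, 4.0), "feat": [1.5, 1.0]},
]
sequence = path_difference_sequence(path)
```

A path with N nodes yields N-1 difference vectors, one per pair of adjacent nodes.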

In this embodiment, after a travel route between any two vehicle images is obtained by means of the MRF, it is necessary to determine whether the travel route is valid, that is, to perform validity determination. Valid means that the travel route is a route that the same vehicle would travel; otherwise, the travel route is an invalid route. In this embodiment, an LSTM is used for the determination. Inputs of the LSTM are the time difference (i.e., temporal difference), the distance difference (i.e., spatial difference), and the appearance difference (i.e., feature difference) between adjacent nodes on the route. As stated above, the appearance difference may be obtained by directly performing subtraction on the feature vectors output after inputting the two images to the Siamese-CNN. Output of the LSTM is a probability value by which the validity of the prediction path can be determined, so as to determine whether the vehicles in the two images are the same vehicle.

In view of the above, by means of this embodiment, a prediction path through which the targets to be determined in the first image and the second image may pass is generated based on spatiotemporal information and feature information contained in the images; and whether the targets to be determined in the first image and the second image are the same is determined by performing validity determination on the prediction path. The validity determination assesses the possibility that the current prediction path is the travel route of the same target to be determined. The higher this possibility is, the more likely it is that the targets to be determined in the first image and the second image are the same target to be determined. Thus, whether targets to be determined in different images are the same target to be determined can be detected and identified more accurately.

The method for identifying a target in this embodiment is performed by any appropriate device having image or data processing capabilities, including but not limited to: a camera, a terminal, a mobile terminal, a PC, a server, an in-vehicle device, an entertainment device, an advertising device, a PDA, a tablet computer, a laptop computer, a handheld game console, smart glasses, a smart watch, a wearable device, a virtual display device, or a display enhancement device (such as Google Glass, Oculus Rift, Hololens, Gear VR), and the like.

Embodiment III

Referring to FIG. 3, a schematic flowchart of a method for identifying a target according to Embodiment III of the present disclosure is shown.

In this embodiment, the method for identifying a target in the embodiments of the present disclosure is described by taking a vehicle being a target to be determined as an example. However, persons skilled in the art should understand that in practical application, corresponding target identification operations can be implemented for other targets to be determined with reference to this embodiment.

The method for identifying a target in this embodiment includes the following steps:

In step S302, a preliminary sameness probability value of the targets to be determined respectively contained in the first image and the second image is determined according to temporal information, spatial information, and image feature information of the first image and temporal information, spatial information, and image feature information of the second image.

The first image and the second image each include information of a target to be determined.

In the embodiments of the present disclosure, the first image and the second image have a spatiotemporal sequence relationship, and each includes information of a corresponding target to be determined. Based on a comprehensive consideration of the temporal information, the spatial information, and the image feature information of the images, persons skilled in the art may determine a preliminary sameness probability value of the targets to be determined in the two images by any appropriate method.

In a feasible solution, a preliminary sameness probability value of the targets to be determined respectively contained in the first image and the second image may be obtained by using a Siamese-CNN.

A Siamese-CNN is a CNN having at least two branches, and may receive multiple inputs simultaneously and output the similarity between the multiple inputs (which can be expressed in the form of probability). Taking double branches as an example, two images can be simultaneously input to the Siamese-CNN by means of the double branches, and the Siamese-CNN will output the similarity between the two images, or output a determination result concerning whether the two images are similar. The Siamese-CNN in this embodiment includes three branches, where two branches are configured to receive input images, and the other branch is configured to receive the input difference in temporal information (temporal difference) and difference in spatial information (spatial difference) between the two images. By detecting the input images, the similarity in feature (such as appearance similarity) between target objects (which are vehicles in this embodiment) in the images is output, and by detecting the input difference in temporal information and difference in spatial information, the similarity in time and space between the target objects in the images is output. According to the similarity in the two aspects, the preliminary sameness probability value of the target objects in the images, such as the vehicles in this embodiment, may be further determined.

In view of the above, in this embodiment, the first image and the second image, and the difference in temporal information and the difference in spatial information between the first image and the second image may be input into a Siamese-CNN to obtain a preliminary sameness probability value of the targets to be determined in the first image and the second image. After the preliminary sameness probability value is obtained, whether the first image and the second image include the same target to be determined is preliminarily determined according to the preliminary sameness probability value. Specifically, the preliminary sameness probability value is compared with a preset value: if the preliminary sameness probability value is less than or equal to the preset value, it is determined that the first image and the second image do not include the same target to be determined, and if the preliminary sameness probability value is greater than the preset value, it is preliminarily determined that the first image and the second image include the same target to be determined. The preset value may be appropriately set by persons skilled in the art according to actual conditions, and is not limited in the embodiments of the present disclosure.

The Siamese-CNN can effectively determine the similarity between target objects, such as vehicles, in two images having spatiotemporal information, but the present disclosure is not limited to Siamese-CNN. Other ways or neural networks that have similar functions or that can achieve the same purpose are also applicable to the solutions in the embodiments of the present disclosure.

In step S304, a prediction path is generated based on the first image and the second image if the preliminary sameness probability value is greater than a preset value.

Compared with the travel routes of pedestrians, the travel routes of the targets to be determined, such as vehicles, are more stable and more regular. Therefore, the travel routes of the vehicles may be predicted by using the feature information of the vehicles (which can characterize the appearances of the vehicles) together with the spatiotemporal information, and the reliability of vehicle re-identification can be enhanced by means of the route prediction results.

As stated above, the first image and the second image are images having a spatiotemporal sequence relationship. On this basis, it is necessary to further find possible travel routes of the vehicles in the images, where images of the vehicles captured on the travel routes should have a spatiotemporal sequence relationship with the first image and the second image.

In a specific implementation, the prediction path of the targets to be determined is generated by means of an MRF according to the information of the first image and the information of the second image. The specific implementation process is similar to that in step S204 in the foregoing Embodiment II, and details are not described herein again.

In step S306, validity determination is performed on the prediction path, and whether the targets to be determined in the first image and the second image are the same target to be determined is re-identified according to a determination result.

The validity determination assesses the possibility that a prediction path is the travel route of the same target to be determined. The higher this possibility is, the more likely it is that the targets to be determined in the first image and the second image are the same target to be determined.

For example, in some cases, it is possible that the preliminary determination result itself is wrong, that is, the vehicle in the first image and the vehicle in the second image may not be the same vehicle but are misidentified as the same vehicle. If the two vehicles are not the same vehicle, the probability of the two vehicles having the same driving route within a reasonable time range is particularly low, so that the validity of the prediction path determined according to the information of the first image and the information of the second image is also low. Thus, whether the vehicles in the first image and the second image are the same vehicle can be re-determined and re-identified.

In a specific implementation, validity determination is performed on the prediction path by means of an LSTM, and whether the targets to be determined in the first image and the second image are the same target to be determined is re-identified according to the determination result. The specific implementation process is similar to that in step S206 in the foregoing Embodiment II, and details are not described herein again.

According to the method for identifying a target provided by this embodiment, on the basis of preliminarily determining that the targets to be determined respectively contained in the first image and the second image are the same, a prediction path through which the targets to be determined may pass is determined; then, whether the preliminary determination result is correct is determined by means of validity determination of the prediction path, so as to re-identify whether the targets to be determined in the first image and the second image are the same target to be determined. The validity determination assesses the possibility that the current prediction path is the travel route of the same target to be determined. The higher this possibility is, the more likely it is that the targets to be determined in the first image and the second image are the same target to be determined. Thus, whether targets to be determined in different images are the same target to be determined can be re-detected and re-identified more accurately.
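The two-stage flow of steps S302 to S306 may be sketched as follows. The three callables stand in for the Siamese-CNN, the MRF-based path generation, and the LSTM-based validity determination, respectively; their names, the default preset value of 0.5, and the use of a second threshold on the validity probability are assumptions of this example, since the specific determination criteria are left to persons skilled in the art.

```python
from typing import Any, Callable, Sequence


def reidentify(
    first: Any,
    second: Any,
    preliminary_prob: Callable[[Any, Any], float],   # stand-in for the Siamese-CNN (S302)
    build_path: Callable[[Any, Any], Sequence[Any]], # stand-in for MRF path generation (S304)
    path_validity: Callable[[Sequence[Any]], float], # stand-in for the LSTM (S306)
    preset: float = 0.5,                # hypothetical preset value
    validity_threshold: float = 0.5,    # hypothetical decision criterion
) -> bool:
    """Return True if the targets in both images are judged to be the same target."""
    # S302: a value less than or equal to the preset value means "not the same target".
    if preliminary_prob(first, second) <= preset:
        return False
    # S304: generate the prediction path only for preliminarily matched pairs.
    path = build_path(first, second)
    # S306: re-identify according to the validity of the prediction path.
    return path_validity(path) > validity_threshold
```

Because the prediction path is generated only when the preliminary sameness probability value exceeds the preset value, the comparatively expensive path generation and validity determination are skipped for pairs that are clearly different.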

The method for identifying a target in this embodiment is performed by any appropriate device having image or data processing capabilities, including but not limited to: a camera, a terminal, a mobile terminal, a PC, a server, an in-vehicle device, an entertainment device, an advertising device, a PDA, a tablet computer, a laptop computer, a handheld game console, smart glasses, a smart watch, a wearable device, a virtual display device, or a display enhancement device (such as Google Glass, Oculus Rift, Hololens, Gear VR), and the like.

Embodiment IV

FIG. 4 is a schematic structural diagram showing an apparatus for identifying a target according to Embodiment IV of the present disclosure, based on the same technical concept. The apparatus for identifying a target can be configured to execute the method for identifying a target according to Embodiment I.

Referring to FIG. 4, the apparatus for identifying a target includes an acquisition module 401, a generation module 402, and a first determination module 403.

The acquisition module 401 is configured to acquire a first image and a second image, the first image and the second image each including a target to be determined;

the generation module 402 is configured to generate a prediction path based on the first image and the second image, both ends of the prediction path respectively corresponding to the first image and the second image; and

the first determination module 403 is configured to perform validity determination on the prediction path and determine, according to a determination result, whether the targets to be determined in the first image and the second image are the same target to be determined.

By means of the apparatus for identifying a target provided by this embodiment, a prediction path through which the targets to be determined may pass is generated based on information contained in the first image and the second image; and whether the targets to be determined in the first image and the second image are the same is determined by performing validity determination on the prediction path. The validity determination assesses the possibility that the current prediction path is the travel route of the same target to be determined. The higher this possibility is, the more likely it is that the targets to be determined in the first image and the second image are the same target to be determined. Thus, whether targets to be determined in different images are the same target to be determined can be detected and identified more accurately.

Embodiment V

FIG. 5 is a schematic structural diagram showing an apparatus for identifying a target according to Embodiment V of the present disclosure, based on the same technical concept. The target identification apparatus can be configured to execute the method for identifying a target according to Embodiment II.

Referring to FIG. 5, the apparatus for identifying a target includes an acquisition module 501, a generation module 502, and a first determination module 503. The acquisition module 501 is configured to acquire a first image and a second image, the first image and the second image each including a target to be determined; the generation module 502 is configured to generate a prediction path based on the first image and the second image, both ends of the prediction path respectively corresponding to the first image and the second image; and the first determination module 503 is configured to perform validity determination on the prediction path and determine, according to a determination result, whether the targets to be determined in the first image and the second image are the same target to be determined.

In one embodiment, the generation module 502 includes: a second generation sub-module 5021 configured to generate the prediction path of the targets to be determined by means of a probability model according to the feature information of the first image, the temporal information of the first image, the spatial information of the first image, the feature information of the second image, the temporal information of the second image, and the spatial information of the second image.

In one embodiment, the second generation sub-module 5021 includes:

a first determination unit 5022 configured to determine, by means of an MRF, all images including information of the targets to be determined and having a spatiotemporal sequence relationship with the first image and the second image from an acquired image set; and a first generation unit 5023 configured to generate, according to temporal information and spatial information corresponding to all the determined images, the prediction path of the targets to be determined.

In one embodiment, the first generation unit 5023 includes: a second generation unit 5024 configured to generate a prediction path with the first image as a head node and the second image as a tail node according to the temporal information and the spatial information corresponding to all the determined images, where the prediction path further corresponds to at least one intermediate node in addition to the head node and the tail node.

In one embodiment, the first determination unit 5022 is configured to: acquire position information of all camera devices from a start position to an end position by using a position corresponding to the spatial information of the first image as the start position and using a position corresponding to the spatial information of the second image as the end position; generate, according to the relationships between positions indicated by the position information of all the camera devices, at least one device path by using a camera device corresponding to the start position as a start point and using a camera device corresponding to the end position as an end point, where each device path further includes information of at least one other camera device in addition to the camera device as the start point and the camera device as the end point; and determine, from images captured by each of the other camera devices on the current path, an image for each device path by using time corresponding to the temporal information of the first image as start time and using time corresponding to the temporal information of the second image as end time, wherein the image includes the information of the targets to be determined, and has a set temporal sequence relationship with an image which includes the information of the targets to be determined and is captured by a previous camera device adjacent to the current camera device.
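As a non-limiting sketch, the generation of device paths may be illustrated by enumerating simple paths over a graph of camera devices. Representing the relationships between camera positions as an adjacency mapping, identifying cameras by string labels, and using depth-first search are assumptions of this example; the function name `device_paths` is hypothetical.

```python
from typing import Dict, List


def device_paths(adjacency: Dict[str, List[str]], start: str, end: str) -> List[List[str]]:
    """Enumerate device paths from the start camera to the end camera.

    Only paths containing at least one other camera device besides the
    start point and the end point are kept.
    """
    paths: List[List[str]] = []

    def dfs(node: str, visited: List[str]) -> None:
        if node == end:
            if len(visited) >= 3:          # start + at least one intermediate + end
                paths.append(list(visited))
            return
        for nxt in adjacency.get(node, []):
            if nxt not in visited:         # do not revisit a camera on the same path
                visited.append(nxt)
                dfs(nxt, visited)
                visited.pop()

    dfs(start, [start])
    return paths
```

For each resulting device path, the images captured by the intermediate camera devices between the start time and the end time would then be searched for images that include the targets to be determined and satisfy the set temporal sequence relationship.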

In one embodiment, the second generation unit 5024 is configured to: generate, according to the temporal sequence relationship of the determined images, a plurality of connected intermediate nodes having a spatiotemporal sequence relationship for each device path; generate, according to the head node, the tail node, and the intermediate nodes, an image path having a spatiotemporal sequence relationship and corresponding to the current device path; and determine, from the image path corresponding to each device path, a maximum probability image path with the first image as the head node and the second image as the tail node as the prediction path of the targets to be determined.

In one embodiment, the second generation unit 5024 is further configured to: acquire, for the image path corresponding to each device path, a probability of images of every two adjacent nodes in the image path having information of the same target to be determined; calculate, according to the probability of the images of every two adjacent nodes in the image path having the information of the same target to be determined, a probability of the image path being a prediction path of the target to be determined; and determine, according to the probability of each image path being a prediction path of the target to be determined, the maximum probability image path as the prediction path of the target to be determined.

In one embodiment, the first determination module 503 includes: a second determination sub-module 5031 configured to perform validity determination on the prediction path by means of a neural network and determine, according to the determination result, whether the targets to be determined in the first image and the second image are the same target to be determined.

In one embodiment, the second determination sub-module 5031 includes: a first acquisition unit 5032 configured to acquire the temporal difference between adjacent images in the prediction path according to temporal information of the adjacent images, acquire the spatial difference between the adjacent images according to spatial information of the adjacent images, and acquire the feature difference between the targets to be determined in the adjacent images according to feature information of the targets to be determined in the adjacent images; a second acquisition unit 5033 configured to input the obtained temporal difference, spatial difference, and feature difference between the adjacent images in the prediction path into an LSTM to obtain an identification probability of the targets to be determined in the prediction path; and a second determination unit 5034 configured to determine, according to the identification probability of the targets to be determined in the prediction path, whether the targets to be determined in the first image and the second image are the same target to be determined.

In one embodiment, the first acquisition unit 5032 is configured to: separately acquire feature information of the targets to be determined in the adjacent images by means of the Siamese-CNN; and acquire the feature difference between the targets to be determined in the adjacent images according to the separately acquired feature information.

It should be noted that more specific details of the apparatus for identifying a target provided by the embodiments of the present disclosure have been described in detail in the method for identifying a target provided by the embodiments of the present disclosure, and the details are not described herein again.

Embodiment VI

FIG. 6 is a schematic structural diagram showing an apparatus for identifying a target according to Embodiment VI of the present disclosure, based on the same technical concept. The apparatus for identifying a target can be configured to execute the method for identifying a target according to Embodiment III.

Referring to FIG. 6, the apparatus for identifying a target includes an acquisition module 601, a generation module 603, and a first determination module 604. The acquisition module 601 is configured to acquire a first image and a second image, the first image and the second image each including a target to be determined; the generation module 603 is configured to generate a prediction path based on the first image and the second image, both ends of the prediction path respectively corresponding to the first image and the second image; and the first determination module 604 is configured to perform validity determination on the prediction path and determine, according to a determination result, whether the targets to be determined in the first image and the second image are the same.

In one embodiment, the target to be determined is a vehicle.

In one embodiment, the apparatus further includes: a second determination module 602 configured to determine, according to temporal information, spatial information, and image feature information of the first image and temporal information, spatial information, and image feature information of the second image, a preliminary sameness probability value of the targets to be determined respectively contained in the first image and the second image; and correspondingly, the generation module 603 includes: a first generation sub-module 6031 configured to generate a prediction path based on the first image and the second image if the preliminary sameness probability value is greater than a preset value.

In one embodiment, the second determination module 602 includes: a first determination sub-module 6021 configured to input the first image, the second image, and a difference in temporal information and a difference in spatial information between the first image and the second image into a Siamese-CNN to obtain a preliminary sameness probability value of the targets to be determined in the first image and the second image.

It should be noted that more specific details of the apparatus for identifying a target provided by the embodiments of the present disclosure have been described in detail in the method for identifying a target provided by the embodiments of the present disclosure, and the details are not described herein again.

Embodiment VII

Embodiment VII of the present disclosure provides an electronic device which, for example, may be a mobile terminal, a PC, a tablet computer, a server, or the like. Referring to FIG. 7 below, a schematic structural diagram of an electronic device 700, which may be a terminal device or a server, suitable for implementing the embodiments of the present disclosure is shown. As shown in FIG. 7, the electronic device 700 includes one or more processors, a communication element, and the like. The one or more processors are, for example, one or more Central Processing Units (CPUs) 701 and/or one or more Graphics Processing Units (GPUs) 713, and may execute appropriate actions and processing according to executable instructions stored in a Read-Only Memory (ROM) 702 or executable instructions loaded from a storage section 708 to a Random-Access Memory (RAM) 703. The communication element includes a communication component 712 and/or a communication interface 709. The communication component 712 may include, but is not limited to, a network card, and the network card may include, but is not limited to, an InfiniBand (IB) network card. The communication interface 709 includes a communication interface of a network interface card such as a LAN card and a modem, and the communication interface 709 performs communication processing via a network such as the Internet.

The processor may communicate with the ROM 702 and/or the RAM 703 to execute executable instructions, is connected to the communication component 712 by means of a bus 704, and communicates with other target devices by means of the communication component 712, so as to complete corresponding operations of any of the methods for identifying a target provided by the embodiments of the present disclosure, for example, acquiring a first image and a second image, where the first image and the second image each include a target to be determined, generating a prediction path based on the first image and the second image, where both ends of the prediction path respectively correspond to the first image and the second image, and performing validity determination on the prediction path and determining, according to the determination result, whether the targets to be determined in the first image and the second image are the same.

In addition, the RAM 703 may further store various programs and data required for operations of the apparatuses. The CPU 701 or GPU 713, the ROM 702, and the RAM 703 are connected to each other by means of the communication bus 704. In the presence of the RAM 703, the ROM 702 is an optional module. The RAM 703 stores executable instructions, or writes the executable instructions to the ROM 702 during running. The executable instructions cause the processor to execute corresponding operations of the foregoing communication method. An Input/Output (I/O) interface 705 is also connected to the communication bus 704. The communication component 712 may be an integrated component, or may include multiple sub-modules (e.g., multiple IB network cards), and is linked with the communication bus.

The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse and the like; an output section 707 including a Cathode-Ray Tube (CRT), a Liquid Crystal Display (LCD), a loudspeaker and the like; the storage section 708 including hardware and the like; and the communication interface 709 of a network interface card such as a LAN card and a modem. A drive 710 is also connected to the I/O interface 705 according to needs. A removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk, and a semiconductor memory is installed on the drive 710 according to needs, to cause a computer program read from the removable medium 711 to be installed into the storage section 708 according to needs.

It should be noted that the architecture illustrated in FIG. 7 is merely an optional implementation mode. During specific practice, the number and types of the components in FIG. 7 may be selected, decreased, increased, or replaced according to actual requirements. Different functional components may be separated or integrated or the like. For example, the GPU and the CPU may be separated, or the GPU may be integrated on the CPU, and the communication element may be separated from or integrated on the CPU or the GPU or the like. These alternative implementations all fall within the scope of protection of the present disclosure.

Particularly, the process described above with reference to the flowchart according to an embodiment of the present disclosure may be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product, which includes a computer program tangibly contained in a machine-readable medium. The computer program includes a program code for executing a method shown in the flowchart. The program code may include corresponding instructions for correspondingly executing steps of the method provided by an embodiment of the present disclosure, for example, acquiring a first image and a second image, the first image and the second image each including a target to be determined, generating a prediction path based on the first image and the second image, both ends of the prediction path respectively corresponding to the first image and the second image, and performing validity determination on the prediction path and determining, according to a determination result, whether the targets to be determined in the first image and the second image are the same. In this embodiment, the computer program may be downloaded from a network by means of the communication element and installed, and/or be installed from the removable medium 711. When the computer program is executed by the processor, the functions defined in the method according to an embodiment of the present disclosure are executed.

It should be noted that according to needs for implementation, the components/steps described in the embodiments of the present disclosure may be split into more components/steps, and two or more components/steps or some operations of the components/steps may also be combined into new components/steps to achieve the purpose of the embodiments of the present disclosure.

The foregoing methods according to the embodiments of the present disclosure may be implemented in hardware or firmware, or implemented as software or computer code stored in a recording medium (such as a CD-ROM, RAM, floppy disk, hard disk, or magneto-optical disk), or implemented as computer code that can be downloaded by means of a network, is originally stored in a remote recording medium or a non-volatile machine-readable medium, and will be stored in a local recording medium; accordingly, the methods described herein may be handled by software stored in a medium using a general-purpose computer, a special-purpose processor, or programmable or dedicated hardware (such as an ASIC or FPGA). As can be understood, a computer, a processor, a microprocessor controller, or programmable hardware includes a storage component (e.g., RAM, ROM, flash memory, etc.) that can store or receive software or computer code; when the software or computer code is accessed and executed by the computer, processor, or hardware, the processing method described herein is carried out. In addition, when a general-purpose computer accesses code that implements the processes shown herein, the execution of the code converts the general-purpose computer into a special-purpose computer for executing the processes shown herein.

Persons of ordinary skill in the art can understand that the individual exemplary units and arithmetic steps that are described in conjunction with the embodiments disclosed herein are able to be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software is determined by the specific applications and design constraint conditions of the technical solution. For each specific application, the described functions can be implemented by persons skilled in the art using different methods, but this implementation should not be considered to go beyond the scope of the embodiments of the present disclosure.

The above implementations are merely intended to describe the embodiments of the present disclosure, and are not intended to limit the embodiments of the present disclosure. Persons of ordinary skill in the art may make various variations and modifications without departing from the spirit and scope of the embodiments of the present disclosure. Therefore, all equivalent technical solutions also fall within the scope of the embodiments of the present disclosure, and the patent protection scope of the embodiments of the present disclosure shall be limited by the claims.

What is claimed is:
 1. A method for identifying a target, comprising: acquiring a first image and a second image, the first image and the second image each comprising a target to be determined; generating a prediction path based on the first image and the second image, both ends of the prediction path respectively corresponding to targets to be determined in the first image and the second image; and performing validity determination on the prediction path, and determining, according to a determination result, whether the targets to be determined in the first image and the second image are the same target to be determined, wherein before generating the prediction path based on the first image and the second image, the method further comprises: determining a preliminary sameness probability value of the targets to be determined respectively contained in the first image and the second image according to temporal information, spatial information, and image feature information of the first image and temporal information, spatial information, and image feature information of the second image; wherein generating the prediction path based on the first image and the second image comprises: generating the prediction path based on the first image and the second image if the preliminary sameness probability value is greater than a preset value.
 2. The method according to claim 1, wherein determining the preliminary sameness probability value of the targets to be determined respectively contained in the first image and the second image according to the temporal information, the spatial information, and the image feature information of the first image and the temporal information, the spatial information, and the image feature information of the second image comprises: inputting the first image, the second image, and a difference in temporal information and a difference in spatial information between the first image and the second image into a Siamese Convolutional Neural Network (Siamese-CNN) to obtain the preliminary sameness probability value of the targets to be determined in the first image and the second image.
 3. The method according to claim 1, wherein performing the validity determination on the prediction path and determining, according to the determination result, whether the targets to be determined in the first image and the second image are the same target to be determined comprises: performing, through a neural network, validity determination on the prediction path and determining, according to the determination result, whether the targets to be determined in the first image and the second image are the same target to be determined.
 4. The method according to claim 1, wherein performing, through the neural network, the validity determination on the prediction path and determining whether the targets to be determined in the first image and the second image are the same target to be determined according to the determination result comprises: acquiring a temporal difference between adjacent images in the prediction path according to temporal information of the adjacent images; acquiring a spatial difference between the adjacent images according to spatial information of the adjacent images; and acquiring a feature difference between the targets to be determined in the adjacent images according to feature information of the targets to be determined in the adjacent images; inputting the obtained temporal difference, spatial difference, and feature difference between the adjacent images in the prediction path into a Long Short-Term Memory (LSTM) network to obtain an identification probability of the targets to be determined in the prediction path; and determining, according to the identification probability of the targets to be determined in the prediction path, whether the targets to be determined in the first image and the second image are the same target to be determined.
 5. The method according to claim 4, wherein acquiring the feature difference between the targets to be determined in the adjacent images according to the feature information of the targets to be determined in the adjacent images comprises: separately acquiring feature information of the targets to be determined in the adjacent images through the Siamese-CNN; and acquiring the feature difference between the targets to be determined in the adjacent images according to the separately acquired feature information.
 6. An apparatus for identifying a target, comprising: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to: acquire a first image and a second image, the first image and the second image each comprising a target to be determined; generate a prediction path based on the first image and the second image, both ends of the prediction path respectively corresponding to targets to be determined in the first image and the second image; and perform validity determination on the prediction path and determine, according to a determination result, whether the targets to be determined in the first image and the second image are the same target to be determined, wherein the processor is further configured to: before generating the prediction path based on the first image and the second image, determine, according to temporal information, spatial information, and image feature information of the first image and temporal information, spatial information, and image feature information of the second image, a preliminary sameness probability value of the targets to be determined respectively contained in the first image and the second image; and wherein the operation of generating the prediction path based on the first image and the second image comprises: generating the prediction path based on the first image and the second image if the preliminary sameness probability value is greater than a preset value.
 7. The apparatus according to claim 6, wherein the processor is specifically configured to: input the first image, the second image, and a difference in temporal information and a difference in spatial information between the first image and the second image into a Siamese Convolutional Neural Network (Siamese-CNN) to obtain a preliminary sameness probability value of the targets to be determined in the first image and the second image.
 8. The apparatus according to claim 7, wherein the processor is specifically configured to: perform validity determination on the prediction path through a neural network and determine, according to a determination result, whether the targets to be determined in the first image and the second image are the same target to be determined.
 9. The apparatus according to claim 8, wherein the operation of performing the validity determination on the prediction path through the neural network and determining, according to the determination result, whether the targets to be determined in the first image and the second image are the same target to be determined comprises: acquiring a temporal difference between adjacent images in the prediction path according to temporal information of the adjacent images; acquiring a spatial difference between the adjacent images according to spatial information of the adjacent images; and acquiring a feature difference between the targets to be determined in the adjacent images according to feature information of the targets to be determined in the adjacent images; inputting the obtained temporal difference, spatial difference, and feature difference between the adjacent images in the prediction path into a Long Short-Term Memory (LSTM) network to obtain an identification probability of the targets to be determined in the prediction path; and determining, according to the identification probability of the targets to be determined in the prediction path, whether the targets to be determined in the first image and the second image are the same target to be determined.
 10. The apparatus according to claim 9, wherein the operation of acquiring the feature difference between the targets to be determined in the adjacent images according to the feature information of the targets to be determined in the adjacent images comprises: separately acquiring feature information of the targets to be determined in the adjacent images through the Siamese-CNN; and acquiring the feature difference between the targets to be determined in the adjacent images according to the separately acquired feature information.
 11. A non-transitory computer-readable storage medium, having computer program instructions stored thereon, wherein the program instructions, when being executed by a processor, are configured to perform the operations of: acquiring a first image and a second image, the first image and the second image each comprising a target to be determined; generating a prediction path based on the first image and the second image, both ends of the prediction path respectively corresponding to targets to be determined in the first image and the second image; and performing validity determination on the prediction path, and determining, according to a determination result, whether the targets to be determined in the first image and the second image are the same target to be determined, wherein before generating the prediction path based on the first image and the second image, the program instructions being executed by the processor, are further configured to perform the operation of: determining a preliminary sameness probability value of the targets to be determined respectively contained in the first image and the second image according to temporal information, spatial information, and image feature information of the first image and temporal information, spatial information, and image feature information of the second image; wherein the operation of generating the prediction path based on the first image and the second image comprises: generating the prediction path based on the first image and the second image if the preliminary sameness probability value is greater than a preset value.
 12. The non-transitory computer-readable storage medium according to claim 11, wherein the operation of determining the preliminary sameness probability value of the targets to be determined respectively contained in the first image and the second image according to the temporal information, the spatial information, and the image feature information of the first image and the temporal information, the spatial information, and the image feature information of the second image comprises: inputting the first image, the second image, and a difference in temporal information and a difference in spatial information between the first image and the second image into a Siamese Convolutional Neural Network (Siamese-CNN) to obtain the preliminary sameness probability value of the targets to be determined in the first image and the second image.
 13. The non-transitory computer-readable storage medium according to claim 11, wherein the operation of performing the validity determination on the prediction path and determining, according to the determination result, whether the targets to be determined in the first image and the second image are the same target to be determined comprises: performing, through a neural network, validity determination on the prediction path and determining, according to the determination result, whether the targets to be determined in the first image and the second image are the same target to be determined.
 14. The non-transitory computer-readable storage medium according to claim 11, wherein the operation of performing, through the neural network, the validity determination on the prediction path and determining whether the targets to be determined in the first image and the second image are the same target to be determined according to the determination result comprises: acquiring a temporal difference between adjacent images in the prediction path according to temporal information of the adjacent images; acquiring a spatial difference between the adjacent images according to spatial information of the adjacent images; and acquiring a feature difference between the targets to be determined in the adjacent images according to feature information of the targets to be determined in the adjacent images; inputting the obtained temporal difference, spatial difference, and feature difference between the adjacent images in the prediction path into a Long Short-Term Memory (LSTM) network to obtain an identification probability of the targets to be determined in the prediction path; and determining, according to the identification probability of the targets to be determined in the prediction path, whether the targets to be determined in the first image and the second image are the same target to be determined.
 15. The non-transitory computer-readable storage medium according to claim 14, wherein the operation of acquiring the feature difference between the targets to be determined in the adjacent images according to the feature information of the targets to be determined in the adjacent images comprises: separately acquiring feature information of the targets to be determined in the adjacent images through the Siamese-CNN; and acquiring the feature difference between the targets to be determined in the adjacent images according to the separately acquired feature information. 
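
By way of non-limiting illustration only, the flow recited in the claims above may be sketched as follows. The sketch is not part of the claims and is not the implementation of the present disclosure. All names (Observation, adjacent_differences, same_target, preset_value, validity_threshold) are hypothetical. The Siamese-CNN, the path generator, and the LSTM network are represented by caller-supplied functions rather than trained models. The Euclidean spatial difference, the element-wise absolute feature difference, the default thresholds of 0.5, and the return of None when no prediction path is generated are assumptions made for the example.

```python
from dataclasses import dataclass
from math import hypot
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class Observation:
    """One image of a target to be determined (field names are hypothetical)."""
    timestamp: float                # temporal information, e.g. seconds
    location: Tuple[float, float]   # spatial information, e.g. camera (x, y)
    feature: Sequence[float]        # image feature vector of the target


# (temporal difference, spatial difference, feature difference)
Difference = Tuple[float, float, List[float]]


def adjacent_differences(path: Sequence[Observation]) -> List[Difference]:
    """Compute the three differences for each pair of adjacent images in a path."""
    diffs: List[Difference] = []
    for a, b in zip(path, path[1:]):
        dt = b.timestamp - a.timestamp
        # Assumption: Euclidean distance stands in for the spatial difference.
        ds = hypot(b.location[0] - a.location[0], b.location[1] - a.location[1])
        # Assumption: element-wise absolute difference of the feature vectors.
        df = [abs(x - y) for x, y in zip(a.feature, b.feature)]
        diffs.append((dt, ds, df))
    return diffs


def same_target(
    first: Observation,
    second: Observation,
    preliminary_prob: Callable[[Observation, Observation], float],
    generate_path: Callable[[Observation, Observation], Sequence[Observation]],
    path_validity: Callable[[List[Difference]], float],
    preset_value: float = 0.5,
    validity_threshold: float = 0.5,
) -> Optional[bool]:
    """Gate on the preliminary probability, then judge the prediction path.

    Returns None when the preliminary sameness probability does not exceed
    preset_value, because no prediction path is generated in that case.
    """
    if preliminary_prob(first, second) <= preset_value:
        return None
    path = generate_path(first, second)  # ends correspond to first and second
    probability = path_validity(adjacent_differences(path))
    return probability > validity_threshold


if __name__ == "__main__":
    a = Observation(0.0, (0.0, 0.0), [1.0, 2.0])
    b = Observation(10.0, (3.0, 4.0), [1.5, 1.0])
    result = same_target(
        a, b,
        preliminary_prob=lambda x, y: 0.9,   # stand-in for a Siamese-CNN
        generate_path=lambda x, y: [x, y],   # stand-in for a path generator
        path_validity=lambda diffs: 0.8,     # stand-in for an LSTM network
    )
    print(result)
```

In this sketch the three learned components are injected as functions, so that the gating on the preset value and the computation of the differences between adjacent images can be examined independently of any particular network.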